Abstract.

There are circumstances under which stratified sampling is worse than simple 
random sampling, even if the allocation of the sample sizes is optimal. This 
phenomenon was discovered more than sixty years ago, but is not as widely known 
as one might expect. We give lower and upper bounds for the size of the effect, as
well as an explanation.
1. PROLOGUE

'Stratification is a common technique,' often necessary, but also attractive
because it 'may produce a gain in precision in the estimates of characteristics
of the whole population' (Cochran (1963), §5.1). In fact, if all sampling is done
with replacement and the sample sizes are proportional to the strata sizes,
stratified random sampling is at least as precise as simple random sampling.
It even approaches perfection as the homogeneity inside the strata increases,
i.e., as the heterogeneity of the population is more reflected by the
heterogeneity between the strata and less by the heterogeneity inside the strata.
Consequently, the rule of thumb with respect to stratified sampling is that it 
doesn't hurt to try. 

In real life, however, sampling is without replacement (cf. Cochran (1963),
§2.1: 'Sampling with replacement is entirely feasible but except in special
circumstances is seldom used, since there seems little point in having the same
unit twice in the sample'), and without replacement the rule of thumb is
no longer valid. Even optimal stratified sampling may hurt then, in that the
corresponding estimator can have a larger variance than the estimator based
on a simple random sample (cf. Armitage (1947), Cochran (1963) §5.6, Evans
(1951), and Govindarajulu (1999) §5.5; on the obscurity of this fact, see,
for instance, the same Govindarajulu (1999) §5.5 and Wilks (1963), §10.9).

The intuition might be helped here by realizing that, as equalities (1) below 
remind us, not replacing yields an advantage that is zero for a sample size 
of 1 and increases with the sample size. Thus, the advantage is larger for 
one sample of size n > 1 than for n samples of size 1; cf. the illustration of 
Theorem 3 in §2. 

We only consider the simplest possible case, that of a dichotomous population.
For this case, the results in Armitage (1947), Cochran (1963), and Evans
(1951) are extended to what looks like a quite complete picture. The
simple-is-better effect shows up in more circumstances than previously thought
and is provided with exact bounds for its size.



2. LOWER AND UPPER BOUNDS 

Consider an urn containing $N$ balls, of which $pN$ are red and $(1-p)N$ are
black, for some $p \in [0,1]$. We want to estimate $p$. One approach is to take a
sample of $n$ balls from the urn and estimate $p$ by the fraction of red balls in
the sample. A sample of $n$ balls with replacement is a random member
$(b_1, \ldots, b_n)$ of $(\text{urn})^n$, where all outcomes are equally likely; a
sample of $n$ balls without replacement is the same, except that the
$b_1, \ldots, b_n$ are all different. Let $X$ and $Y$ denote the number of red
balls in a sample of size $n$ with and without replacement, respectively. Then
for the fraction estimators $X/n$ and $Y/n$ for $p$ the truth of

$$\operatorname{var}\frac{X}{n} = \frac{p(1-p)}{n} = \frac{N-1}{N-n}\,\operatorname{var}\frac{Y}{n} \tag{1}$$

is well known; it is better never to see the same ball twice.
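
Identity (1) can be checked exactly on a small urn by enumerating the binomial and hypergeometric distributions. The following is only an illustrative sketch: the function names and the urn ($N = 20$, 8 red balls, $n = 5$) are assumptions made for the example, not part of the paper.

```python
from fractions import Fraction
from math import comb

def var_fraction_with_replacement(N, R, n):
    """Exact variance of X/n, where X ~ Binomial(n, R/N)."""
    p = Fraction(R, N)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf)) / n
    return sum((Fraction(k, n) - mean) ** 2 * w for k, w in enumerate(pmf))

def var_fraction_without_replacement(N, R, n):
    """Exact variance of Y/n, where Y ~ Hypergeometric(N, R, n)."""
    total = comb(N, n)
    pmf = [Fraction(comb(R, k) * comb(N - R, n - k), total) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf)) / n
    return sum((Fraction(k, n) - mean) ** 2 * w for k, w in enumerate(pmf))

# Hypothetical urn: N = 20 balls, R = 8 red, sample size n = 5.
N, R, n = 20, 8, 5
p = Fraction(R, N)
vx = var_fraction_with_replacement(N, R, n)
vy = var_fraction_without_replacement(N, R, n)

# Both equalities in (1), checked in exact rational arithmetic.
assert vx == p * (1 - p) / n
assert vx == Fraction(N - 1, N - n) * vy
```

Exact fractions are used instead of floats so that the equalities in (1) can be asserted literally.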

Now suppose the urn consists of $m \ge 2$ disjoint sub-urns, strata, each stratum $j$
containing $N_j \ge 2$ balls, $\sum_{j=1}^m N_j = N$, such that for each stratum $j$ we know
$N_j/N$ but not its fraction $p_j$ of red balls. Let $(n_1, \ldots, n_m)$ be an allocation,
i.e., the $n_j$ are natural numbers with $1 \le n_j \le N_j$ for all $j$ and $\sum_{j=1}^m n_j = n$,
and let $X_j$, $Y_j$, $j = 1, \ldots, m$, denote the number of red balls in a sample of
size $n_j$ from stratum $j$ with and without replacement, respectively; then each
of

$$\frac{X}{n} \quad \text{(the simple estimator with replacement)},$$

$$\sum_{j=1}^m \frac{N_j}{N}\,\frac{X_j}{n_j} \quad \text{(the stratification estimator with replacement)},$$

$$\frac{Y}{n} \quad \text{(the simple estimator without replacement)},$$

$$\text{and} \quad \sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} \quad \text{(the stratification estimator without replacement)}$$

is an unbiased estimator for $p$; for their optimality, cf. Neyman (1934). For
both with and without replacement we want to compare the variance of
the simple estimator to that of the stratification estimator, i.e., $\operatorname{var}(X/n)$
to $\operatorname{var}\bigl(\sum_{j=1}^m (N_j/N)X_j/n_j\bigr)$ and $\operatorname{var}(Y/n)$ to $\operatorname{var}\bigl(\sum_{j=1}^m (N_j/N)Y_j/n_j\bigr)$, under
the assumption that the $X_j$ are independent as well as the $Y_j$.

It is immediate that if the strata are homogeneous but the whole population
is not, i.e., $p \in (0,1)$ and each $p_j$ is equal to 0 or 1, then the
stratification estimators are perfect while, with replacement, the simple
estimator is not, and, without replacement, the simple estimator is only perfect
when it is exhaustive, so that $0 = \operatorname{var}\bigl(\sum_{j=1}^m (N_j/N)X_j/n_j\bigr) < \operatorname{var}(X/n)$ and
$0 = \operatorname{var}\bigl(\sum_{j=1}^m (N_j/N)Y_j/n_j\bigr) \le \operatorname{var}(Y/n)$.

For arbitrary $p_j$, the stratification estimator $\sum_{j=1}^m (N_j/N)X_j/n_j$ is still not
worse than the simple estimator $X/n$ as long as the allocation is proportional,
i.e., $n_j = (N_j/N)n$ for every $j$, because in that case

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{X_j}{n_j} = \operatorname{var}\frac{X}{n} - \sum_{j=1}^m \frac{N_j}{nN}\,(p_j - p)^2 \tag{2}$$

holds, as one easily verifies. Thus, if all sampling is with replacement and 
the allocation is proportional, then stratified sampling is seen to reduce the 
variance, unless all the pj are equal. And where it doesn't help, it doesn't 
harm either. However, this reassurance no longer holds as soon as we change 
the allocation: 

Theorem 1. If $p_1 = \cdots = p_m = p \in (0,1)$ and the allocation is not
proportional, then simple is better in that

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{X_j}{n_j} > \operatorname{var}\frac{X}{n},$$

i.e., the variance of the stratification estimator with replacement is greater 
than the variance of the simple estimator with replacement. 

Nor does it hold if all samples are drawn without replacement: 

Theorem 2. If $p_1 = \cdots = p_m = p \in (0,1)$ and $n < N$, then simple is better
in that

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} > \operatorname{var}\frac{Y}{n}, \tag{3}$$

i.e., the variance of the stratification estimator without replacement is greater
than the variance of the simple estimator without replacement.

This is (vi&) in Armitage (1947) for a dichotomous population, i.e., for the
case where Armitage's 'variable x' has only 2 different values, except that we
do not need his condition that (in the dichotomous case) all the $N_j$ are equal.
It is also the dichotomous case of what is proved at the end of §5.6 in Cochran
(1963), except that our condition $p_j = p$ is replaced by the condition that all
the 'mean square [errors] within strata' $p_j(1-p_j)N_j/(N_j-1)$ are equal and
larger than the 'mean square [error] among strata' $\sum_{j=1}^m N_j(p_j-p)^2/(m-1)$.
In Hájek (1981), the observation after (20.31) that simple is better if $p_j = p$
only refers to proportional allocation, not necessarily to all allocations.
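
Theorems 1 and 2 can be illustrated numerically with the closed-form variances of the four estimators. The sketch below is a hedged illustration: the helper names `var_simple` and `var_stratified` and the urn (strata of sizes 6 and 4, all $p_j = 1/2$, $n = 5$) are assumptions introduced here. It uses the binomial variance $p_j(1-p_j)/n_j$ per stratum and the finite-population factor $(N_j - n_j)/(N_j - 1)$ from (1).

```python
from fractions import Fraction

def var_simple(N, n, p, replace):
    """Variance of the simple estimator X/n or Y/n, from (1)."""
    v = p * (1 - p) / n
    return v if replace else v * Fraction(N - n, N - 1)

def var_stratified(sizes, alloc, ps, replace):
    """Variance of the stratification estimator for independent stratum samples."""
    N = sum(sizes)
    total = Fraction(0)
    for Nj, nj, pj in zip(sizes, alloc, ps):
        v = pj * (1 - pj) / nj
        if not replace:
            v *= Fraction(Nj - nj, Nj - 1)  # per-stratum finite-population factor
        total += Fraction(Nj, N) ** 2 * v
    return total

# Hypothetical urn: strata of sizes 6 and 4, every stratum half red.
sizes, p = (6, 4), Fraction(1, 2)
N, n, ps = sum(sizes), 5, (p, p)
proportional, skewed = (3, 2), (4, 1)

with_repl_prop = var_stratified(sizes, proportional, ps, replace=True)
with_repl_skew = var_stratified(sizes, skewed, ps, replace=True)
without_repl_prop = var_stratified(sizes, proportional, ps, replace=False)

# With replacement: proportional allocation ties, any other allocation loses (Theorem 1).
assert with_repl_prop == var_simple(N, n, p, replace=True)
assert with_repl_skew > var_simple(N, n, p, replace=True)
# Without replacement: even proportional allocation loses (Theorem 2).
assert without_repl_prop > var_simple(N, n, p, replace=False)
```

In this example the stratified variance without replacement is 47/1500 against 1/36 for the simple estimator, so the loss is small but strict.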

Under additional conditions inequality (3) may be sharpened: 

Theorem 3. Let

$$B := \frac{N-1}{N-m}\,\operatorname{var}\frac{Y}{n} = \frac{N-n}{N-m}\,\frac{p(1-p)}{n}.$$

If $p_1 = \cdots = p_m = p \in (0,1)$, $n < N$, and
(c1) $n_j \le \tfrac34 N_j$ for $j = 1, \ldots, m$, or
(c2) the allocation is proportional, or
(c3) $N_1 = N_2 = \cdots = N_m = N/m$,
then

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} \ge B,$$

i.e., the variance of the stratification estimator without replacement is at least
$(N-1)/(N-m)$ times the variance of the simple estimator without replacement,
and

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} = B \iff n_j = n/m,\ N_j = N/m \ \ \forall j.$$



Theorem 3(c3) follows from Evans (1951) (12a, c). 
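
The lower bound of Theorem 3 under condition (c1) can be probed by brute force for two strata. The sketch below rests on two assumptions that should be kept in mind: condition (c1) is read as $n_j \le \tfrac34 N_j$ (the threshold suggested by the factor $3 - 4y$ in the proof in §3), and the search range ($2 \le N_j \le 9$) is an arbitrary choice. The function name `ratio_to_B` is hypothetical. Since all $p_j = p$, the factor $p(1-p)$ cancels in the ratio.

```python
from fractions import Fraction
from itertools import product

def ratio_to_B(sizes, alloc):
    """var(stratified, without replacement) / B when all p_j = p."""
    N, n, m = sum(sizes), sum(alloc), len(sizes)
    strat = sum(Fraction(Nj, N) ** 2 * Fraction(Nj - nj, nj * (Nj - 1))
                for Nj, nj in zip(sizes, alloc))
    B = Fraction(N - n, (N - m) * n)
    return strat / B

worst = None
for N1, N2 in product(range(2, 10), repeat=2):
    for n1, n2 in product(range(1, N1 + 1), range(1, N2 + 1)):
        # Condition (c1), under the reading n_j <= (3/4) N_j assumed here.
        if 4 * n1 <= 3 * N1 and 4 * n2 <= 3 * N2:
            r = ratio_to_B((N1, N2), (n1, n2))
            worst = r if worst is None else min(worst, r)

# The smallest ratio found is exactly 1, i.e. var >= B throughout the range.
assert worst == 1
```

The minimum ratio 1 is attained for equal strata with equal allocation, for instance $N_1 = N_2 = 2$ and $n_1 = n_2 = 1$, in line with the equality statement of the theorem.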

A special case will illustrate Theorem 3 and bring out a weak point of the
stratification estimator; the factor $(N-1)/(N-m)$ in $B$ appearing here was
met in (1) (take $n = m$). Imagine that all strata not only have the same
composition, i.e., $p_j = p$, but also the same size, i.e., $N_j = N/m$, and that
from each stratum only one ball is taken, so $n_j = 1$ and $n = m$. Then
$\sum_{j=1}^m (N_j/N)Y_j/n_j = (1/n)\sum_{j=1}^n Y_j$, which is distributed as $X/n$, so that
with (1)

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} = \operatorname{var}\frac{X}{n} = \frac{N-1}{N-n}\,\operatorname{var}\frac{Y}{n}.$$

Splitting the sample over the strata reduces the without-replacement bonus
from (1).

The need for extra conditions in Theorem 3 such as (c1), (c2), or (c3), and
the room there is for Theorem 2 are demonstrated by considering $m = N_1 =
n_1 = 2$, $n = N - 1$, and $N > 5$.
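
This example can be evaluated directly. The sketch below reads it as two strata with $N_1 = n_1 = 2$, $N_2 = N - 2$ and $n_2 = N - 3$, which is an interpretation of the garbled source line; $p = 1/2$ is then forced by $N_1 = 2$, and $N$ is taken even so that $pN_2$ is an integer. The variable names are assumptions made for the example.

```python
from fractions import Fraction

p = Fraction(1, 2)  # forced: 0 < p < 1 with p * N_1 an integer and N_1 = 2
rows = []
for N in (6, 8, 10, 12):
    sizes, alloc = (2, N - 2), (2, N - 3)   # m = 2, n = N - 1
    n, m = sum(alloc), len(sizes)
    # Stratified variance without replacement, all p_j = p.
    strat = sum(Fraction(Nj, N) ** 2 * p * (1 - p) / nj * Fraction(Nj - nj, Nj - 1)
                for Nj, nj in zip(sizes, alloc))
    simple = Fraction(N - n, N - 1) * p * (1 - p) / n   # var(Y/n), from (1)
    B = Fraction(N - n, N - m) * p * (1 - p) / n        # bound of Theorem 3
    rows.append((N, simple, strat, B))
    print(N, float(simple), float(strat), float(B))

# Theorem 2 holds (simple < stratified), but the bound B of Theorem 3 fails.
assert all(simple < strat < B for _, simple, strat, B in rows)
```

The stratified variance lands strictly between the simple variance and $B$, so none of (c1), (c2), (c3) can simply be dropped.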

Further, one may ask if the condition '$p_1 = \cdots = p_m$' is only the beginning:
are there other distributions of the red balls among the strata, for which
there are theorems similar to Theorem 3 but with lower bounds that are even
higher than $B$ in Theorem 3? The answer is 'no', as long as $N_j := N/m$ and
$n_j := n/m$ are feasible choices, because for these choices $B$ is an upper bound
(over varying distributions, given $N$, $n$, $m$, and $p$) for the variance of the
stratification estimator $\sum_{j=1}^m (N_j/N)Y_j/n_j$ (corresponding to $p_1 = \cdots = p_m$;
cf. Theorem 3):

Theorem 4. If $N_j = N/m \ge 2$, $n_j = n/m$ for all $j$, and $n < N$ with $B$ as
in Theorem 3, then

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} \le B,$$

i.e., the variance of the stratification estimator without replacement is at most
$(N-1)/(N-m)$ times the variance of the simple estimator without replacement,
and

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} = B \iff p_1 = \cdots = p_m.$$

Cf. 'the worst result to be anticipated' on p. 99 of Evans (1951). (Namely, 'for
a second' variable of interest; strata that are good, i.e., different, with respect
to the first variable of interest need not be so for a second.) In the situation
of Theorem 4, the worst is not that bad: in practice, $(N-1)/(N-m)$ will
be close to 1, and it also follows that if for every stratum the sample size is
increased by 1, the new variance will not exceed the simple random sample
variance corresponding to the old sample size.
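
Theorem 4 can be checked by enumerating every way Nature may distribute the red balls over equal strata. The sketch below is illustrative only; the configuration (3 strata of 4 balls, 2 draws per stratum, 6 red balls in total) and the name `var_stratified` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical configuration: N = 12, n = 6, p = 1/2, equal strata and allocation.
m, Nj, nj, reds = 3, 4, 2, 6
N, n, p = m * Nj, m * nj, Fraction(reds, m * Nj)

def var_stratified(red_counts):
    """Stratified variance without replacement for equal strata sizes and allocation."""
    total = Fraction(0)
    for r in red_counts:
        pj = Fraction(r, Nj)
        total += Fraction(1, m) ** 2 * pj * (1 - pj) / nj * Fraction(Nj - nj, Nj - 1)
    return total

B = Fraction(N - n, N - m) * p * (1 - p) / n

# All distributions of the red balls over the strata ("Nature").
natures = [c for c in product(range(Nj + 1), repeat=m) if sum(c) == reds]
worst = max(natures, key=var_stratified)

assert all(var_stratified(c) <= B for c in natures)   # upper bound of Theorem 4
assert var_stratified(worst) == B                      # attained ...
assert worst == (2, 2, 2)                              # ... when all p_j are equal
```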

Theorem 4 shows that if we want to curb the badness of stratified sampling 
and proportional allocation is an option, then it works, at least for strata of 
the same size. This makes us realize that it always works, even for arbitrary 
strata sizes, because independence, (1), and (2) imply 



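$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} \le \operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{X_j}{n_j} \le \frac{p(1-p)}{n}$$

under proportional allocation. Our final results essentially show how the upper
bound $p(1-p)/n$ for $\operatorname{var}\sum_{j=1}^m (N_j/N)Y_j/n_j$ can be improved.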

Theorem 5. If $(N_j/N)n$ and $pN_j$ are integers and $0 < pN_j < N_j$ for all $j$,
then

$$B \le \min_{\text{Statistician}}\ \max_{\text{Nature}}\ \operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} \le B + \frac{N-n}{4(N-m)nN^2}\sum_{j=1}^m \frac{N-mN_j}{N_j-1}, \tag{4}$$

where 'Statistician' means an allocation $(n_1, \ldots, n_m)$ that satisfies $n_j \le \tfrac34 N_j$
for all $j$ or is proportional, 'Nature' means a distribution $(p_1, \ldots, p_m)$ of the
$pN$ red balls among the strata (so $p_j \in [0,1]$, $\sum_{j=1}^m p_jN_j = pN$, and $p_jN_j \in
\{0, 1, \ldots, N_j\}$), and $B$ is as in Theorem 3; in fact, $\max_{\text{Nature}}$ does not exceed
the upper bound in (4) if the allocation is proportional.
If $n \le (3/4)N$, then the upper bound in (4) does not exceed $p(1-p)/n$.

For the lower bound we observe that it follows from Theorem 3 and that, also
by Theorem 3, if not $N_j = N/m$ for all $j$, then '$B \le$' may be replaced by
'$B <$'. The difference between upper and lower bound is bounded by $1/(4N)$
because $\sum_{j=1}^m (N-mN_j)/(N_j-1) \le \sum_{j=1}^m N/(2-1) = mN$; it reduces to 0
in case all strata have the same size (cf. Theorem 4).
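
The bounds of Theorem 5 can be checked on a toy urn under proportional allocation. The sketch below assumes that the excess of the upper bound over $B$ is $\frac{N-n}{4(N-m)nN^2}\sum_j \frac{N-mN_j}{N_j-1}$, which is the expression appearing in the final part of the proof in §3; the urn (strata of sizes 2 and 4, $n = 3$, $p = 1/2$) and all names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical urn: N = 6, proportional allocation (1, 2), three red balls.
sizes, alloc, reds = (2, 4), (1, 2), 3
N, n, m = sum(sizes), sum(alloc), len(sizes)
p = Fraction(reds, N)

def var_stratified(red_counts):
    """Stratified variance without replacement for a given red-ball distribution."""
    return sum(Fraction(Nj, N) ** 2 * Fraction(r, Nj) * (1 - Fraction(r, Nj)) / nj
               * Fraction(Nj - nj, Nj - 1)
               for Nj, nj, r in zip(sizes, alloc, red_counts))

B = Fraction(N - n, N - m) * p * (1 - p) / n
# Assumed form of the excess term in the upper bound (see lead-in).
excess = Fraction(N - n, 4 * (N - m) * n * N**2) * sum(
    Fraction(N - m * Nj, Nj - 1) for Nj in sizes)

natures = [c for c in product(*(range(Nj + 1) for Nj in sizes)) if sum(c) == reds]
max_nature = max(var_stratified(c) for c in natures)

assert B <= max_nature <= B + excess
assert excess <= Fraction(1, 4 * N)          # gap between the bounds is at most 1/(4N)
assert B + excess <= p * (1 - p) / n         # here n <= (3/4) N
```

In this particular urn the maximum over Nature coincides with the upper bound, because the composition $p_j = 1/2$ in both strata is feasible.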

The circumstances under which stratified sampling will hurt have been called
'very unusual' and 'extreme' (Evans, 1951), 'an academic curiosity', which will
happen only 'mathematically' (Cochran, 1963), as well as 'quite conceivable'
(Govindarajulu, 1999).



3. JUSTIFICATIONS 



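Proof of Theorem 1. By Jensen's inequality we obtain

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{X_j}{n_j} = \sum_{j=1}^m \left(\frac{N_j}{N}\right)^2 \frac{p(1-p)}{n_j} = p(1-p)\sum_{j=1}^m \frac{N_j}{N}\,\frac{1}{n_jN/N_j} \ge p(1-p)\,\frac{1}{\sum_{j=1}^m \frac{N_j}{N}\,\frac{n_jN}{N_j}} = \frac{p(1-p)}{n} = \operatorname{var}\frac{X}{n},$$

with '$=$' instead of '$\ge$' if and only if $n_j/N_j$ is constant, i.e., the allocation is
proportional.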



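Proof of Theorem 2. Let $m \ge 2$, $1 \le n_j \le N_j$, $N_j \ge 2$, $j = 1, \ldots, m$, be
integers with $N = \sum_{j=1}^m N_j$, $n = \sum_{j=1}^m n_j < N$. In order to prove

$$\sum_{j=1}^m \frac{N_j^2}{N_j-1}\,\frac{N_j-n_j}{n_j} > \frac{N^2}{N-1}\,\frac{N-n}{n}, \tag{5}$$

it suffices to prove it for $m = 2$. Indeed, by splitting off one stratum from the
urn at a time, applying (5) with $m = 2$ each time, and observing that one
still has '$\ge$' instead of '$>$' if $n = N$, one obtains (5) for the general case. For
$1 \le k \le K$ and $1 \le \ell \le L$, $K, L > 1$, $k + \ell < K + L$ we will prove

$$\frac{(K+L)^2}{K+L-1}\left(\frac{K+L}{k+\ell} - 1\right) - \frac{K^2}{K-1}\left(\frac{K}{k} - 1\right) - \frac{L^2}{L-1}\left(\frac{L}{\ell} - 1\right) < 0. \tag{6}$$

To this end we rewrite the LHS of (6) as $S + T$ with

$$S := \frac{K+L}{K+L-1}\left(\frac{(K+L)^2}{k+\ell} - \frac{K^2}{k} - \frac{L^2}{\ell}\right) =: \frac{K+L}{K+L-1}\,U,$$

$$T := -\frac{(K+L)^2}{K+L-1} + \left(\frac{K+L}{K+L-1} - \frac{K}{K-1}\right)\frac{K^2}{k} + \frac{K^2}{K-1} + \left(\frac{K+L}{K+L-1} - \frac{L}{L-1}\right)\frac{L^2}{\ell} + \frac{L^2}{L-1}.$$

Note that $\partial U/\partial k = -(K+L)^2/(k+\ell)^2 + K^2/k^2 > 0$ iff $k < (K/L)\ell$.
Consequently, $U$ is maximal for $k = (K/L)\ell$ and

$$S \le \frac{K+L}{K+L-1}\,U\Big|_{k=(K/L)\ell} = 0.$$

Clearly, $T$ is strictly increasing in $k$ and $\ell$ and hence

$$T < T\Big|_{k=K,\ \ell=L} = 0,$$

which completes the proof.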
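
The two-strata case of the inequality can also be confirmed by exhaustive search over small sizes. The sketch below is not part of the proof: the function name `f`, the notation $f(N, n) = N^2(N-n)/((N-1)n)$ for one stratum's contribution after dividing the variance by $p(1-p)/N^2$, and the search range are assumptions made for illustration.

```python
from fractions import Fraction

def f(N, n):
    """N^2 (N - n) / ((N - 1) n): scaled without-replacement variance of one urn."""
    return Fraction(N * N * (N - n), (N - 1) * n)

violations = []
for K in range(2, 9):
    for L in range(2, 9):
        for k in range(1, K + 1):
            for l in range(1, L + 1):
                # Merged urn (K + L, k + l) must be strictly better than the split.
                if k + l < K + L and not f(K + L, k + l) < f(K, k) + f(L, l):
                    violations.append((K, L, k, l))

assert violations == []
# Exhaustive sampling (k = K, l = L) gives equality: both sides vanish.
assert f(4, 4) == f(2, 2) + f(2, 2) == 0
```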

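Proof of Theorem 3. The statements corresponding to (c2) and (c3) follow
straightforwardly from the fact that if terms $t_j > 0$ have sum $\sum_{j=1}^m t_j = t$,
then for all $a_j \ge 0$, not every $a_j = 0$, we have

$$\sum_{j=1}^m \frac{a_j^2}{t_j} = \frac{1}{t}\sum_{j=1}^m \left(\frac{a_j}{\sqrt{t_j}}\right)^2 \sum_{j=1}^m \left(\sqrt{t_j}\right)^2 \ge \frac{1}{t}\left(\sum_{j=1}^m a_j\right)^2,$$

with '$=$' instead of '$\ge$' if and only if $t_j = t\,a_j/\sum_{i=1}^m a_i$, by
Cauchy-Schwarz. With $a_j = N_j$ and $t_j = N_j - 1$ we obtain

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} = \frac{p(1-p)(N-n)}{nN^2}\sum_{j=1}^m \frac{N_j^2}{N_j-1} \ge \frac{p(1-p)(N-n)}{nN^2}\,\frac{N^2}{N-m} = B,$$

which proves the statements corresponding to (c2), and with $a_j = 1$ and
$t_j = n_j$ we obtain

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} = \sum_{j=1}^m \frac{1}{m^2}\,\frac{p(1-p)}{n_j}\,\frac{N-mn_j}{N-m} = \frac{p(1-p)}{m^2(N-m)}\left(N\sum_{j=1}^m \frac{1}{n_j} - m^2\right) \ge \frac{p(1-p)}{m^2(N-m)}\left(\frac{Nm^2}{n} - m^2\right) = \frac{p(1-p)}{n}\,\frac{N-n}{N-m},$$

which proves the statements corresponding to (c3).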

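In order to prove the statements corresponding to (c1), we observe that the
function

$$\psi(x,y) := \frac{\frac{1}{y} - 1}{1 - x}$$

is strictly convex on $(0,1) \times (0,\tfrac34]$, because it is strictly convex on any segment
in $(0,1) \times (0,\tfrac34]$. On any segment $\{t(x_1,\tfrac34) + (1-t)(x_2,\tfrac34)\}$, namely, the strict
convexity is clear, while for a segment not contained in $y = \tfrac34$ we have

$$\det\begin{pmatrix} \dfrac{2(1/y-1)}{(1-x)^3} & \dfrac{-1}{y^2(1-x)^2} \\[2ex] \dfrac{-1}{y^2(1-x)^2} & \dfrac{2}{y^3(1-x)} \end{pmatrix} = \frac{3-4y}{y^4(1-x)^4}.$$

The Hessian of $\psi$, therefore, of which the determinant is the product of the
eigenvalues and the sum of the diagonal elements is the sum of the eigenvalues,
is positive definite outside $y = \tfrac34$, so the second derivative of $t \in [0,1] \mapsto
\psi(t(x_1,y_1) + (1-t)(x_2,y_2))$ is positive on $(0,1)$.

Consequently, applying Jensen's inequality to the random 2-vector
$j \mapsto (1/N_j,\ n_j/N_j)$, $j \in \{1, \ldots, m\}$, with $P(\{j\}) = N_j/N$, $j = 1, \ldots, m$, gives

$$\operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} = \frac{p(1-p)}{N}\sum_{j=1}^m \frac{N_j}{N}\,\psi\!\left(\frac{1}{N_j},\frac{n_j}{N_j}\right) \ge \frac{p(1-p)}{N}\,\psi\!\left(\frac{m}{N},\frac{n}{N}\right) = B,$$

with '$=$' instead of '$\ge$' if and only if $1/N_j$ and $n_j/N_j$ are constant. This proves
the statements corresponding to condition (c1).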

Proof of Theorem 4. Under $n_j = n/m$, $N_j = N/m$ the variance of the
stratification estimator becomes

$$\sum_{j=1}^m \frac{1}{m^2}\,\frac{p_j(1-p_j)}{n/m}\,\frac{\frac{N}{m} - \frac{n}{m}}{\frac{N}{m} - 1} = \frac{N-n}{mn(N-m)}\sum_{j=1}^m p_j(1-p_j),$$

and $\sum p_jN_j = pN$ becomes $\sum p_j = mp$, while by Cauchy-Schwarz

$$\sum_{j=1}^m p_j(1-p_j) = mp - \sum_{j=1}^m p_j^2 \le mp - \frac{1}{m}\left(\sum_{j=1}^m p_j \cdot 1\right)^2 = mp(1-p).$$

This proves Theorem 4. If, in the situation of Theorem 4, for every stratum
the sample size is increased by 1, we have $n_{\text{new}} = n + m$ and the new variance
will not exceed

$$B_{\text{new}} = \frac{N-m-n}{N-m}\,\frac{p(1-p)}{n+m} \le \frac{N-n}{N-1}\,\frac{p(1-p)}{n}.$$

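Rest of Proof of Theorem 5. For the upper bound, let $\alpha_j, \beta_j$, $1 \le j \le m$, be
positive reals with $\sum_{j=1}^m \beta_j = 1$. By the multiplier method of Lagrange we see
that $\sum_{j=1}^m \alpha_jp_j(1-p_j)$ attains its maximum over $p_j$ under the side condition
$\sum_{j=1}^m \beta_j(p_j - p) = 0$ at

$$p_j = \frac12 - \frac{\beta_j}{\alpha_j}\,\frac{\frac12 - p}{\sum_{i=1}^m \beta_i^2/\alpha_i},$$

with maximum value equal to

$$\frac14\left(\sum_{j=1}^m \alpha_j - \frac{(2p-1)^2}{\sum_{j=1}^m \beta_j^2/\alpha_j}\right).$$

With $\alpha_j = N_j^2/(N_j-1)$ and $\beta_j = N_j/N$,
this shows that under proportional allocation, $n_j = (N_j/N)n$,

$$\max_{\text{Nature}}\ \operatorname{var}\sum_{j=1}^m \frac{N_j}{N}\,\frac{Y_j}{n_j} \le \frac{N-n}{4(N-m)n}\left(\frac{N-m}{N^2}\sum_{j=1}^m \frac{N_j^2}{N_j-1} - (2p-1)^2\right) \tag{7}$$

holds. The right-hand side of (7) is equal to the upper bound in (4).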
Finally, as

$$B = \frac{N-n}{N-m}\,\frac{p(1-p)}{n},$$

the fact that the upper bound in (4) does not exceed $p(1-p)/n$ is equivalent
to

$$\frac{N-n}{4(N-m)nN^2}\sum_{j=1}^m \frac{N-mN_j}{N_j-1} \le \frac{n-m}{N-m}\,\frac{p(1-p)}{n}.$$

Suppose $N_1 \le N_2 \le \cdots \le N_m$. As $1 \le pN_1 \le N_1 - 1$, we have

$$\frac{n-m}{N-m}\,\frac{p(1-p)}{n} \ge \frac{n-m}{(N-m)nN_1}\left(1 - \frac{1}{N_1}\right).$$

Consequently, it is sufficient to prove

$$\sum_{j=1}^m \frac{N-mN_j}{N_j-1} \le \frac{4(N-m)nN^2}{N-n}\,\frac{(n-m)(N_1-1)}{(N-m)nN_1^2},$$

whose left-hand side is equal to the left-hand side in

$$\sum_{j=1}^m \frac{N-mN_j}{N_j-1} \le \frac{m(N-mN_1)}{N_1-1}$$

(remember $N_j \ge N_1$), so that it is sufficient to prove

$$m(N-mN_1)\left(\frac{N_1}{N_1-1}\right)^2 \le 4(n-m)\,\frac{N^2}{N-n}.$$

This is true if

$$m\left(N - m\,\frac{N}{n}\right)\left(\frac{N/n}{N/n-1}\right)^2 \le 4(n-m)\,\frac{N^2}{N-n}$$

(as $N_1 \ge N/n$), which is equivalent to

$$mN(n-m) \le 4n(N-n)(n-m),$$

which is true if $n \le (3/4)N$, because $m \le n$.
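
The last step is plain arithmetic and can be confirmed over a grid of small values. The sketch below reads the final condition as $n \le (3/4)N$ and checks $mN \le 4n(N-n)$ (the last inequality after cancelling $n - m \ge 0$); the function name and the grid bounds are assumptions made for the example.

```python
def final_inequality_holds(N, n, m):
    """m N <= 4 n (N - n): the closing inequality with the factor (n - m) cancelled."""
    return m * N <= 4 * n * (N - n)

# All triples with 2 <= m <= n < N <= 40 and n <= (3/4) N.
checked = [(N, n, m)
           for N in range(4, 41)
           for n in range(2, N)
           for m in range(2, n + 1)
           if 4 * n <= 3 * N]

assert all(final_inequality_holds(*t) for t in checked)
# Beyond n <= (3/4) N the arithmetic inequality can fail, e.g. N = 10, n = 9, m = 5.
assert not final_inequality_holds(10, 9, 5)
```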